Context Analysis System for Japanese Text

Authors

  • Hitoshi Isahara
  • Shun Ishizaki
Abstract

A natural language understanding system is described which extracts contextual information from Japanese texts. It integrates syntactic, semantic and contextual processing serially. The syntactic analyzer obtains rough syntactic structures from the text. The semantic analyzer treats modifying relations inside noun phrases and case relations among verbs and noun phrases. The contextual analyzer then obtains contextual information from the semantic structure extracted by the semantic analyzer. Our system understands the context using prior contextual knowledge on terrorism and plugs the event information from the input sentences into the contextual structure.

1. Introduction

Despite the advanced state of syntactic analysis research for natural language processing and the many useful results it has produced, there have been few studies involving contextual information, and many problems remain unsolved. The natural language understanding system described here employs a syntactic analyzer, a semantic analyzer treating modifying relations inside noun phrases and the relations among verbs and phrases (that is, word-level semantics), and a contextual analyzer (Fig. 1). These analyzers operate in a serially integrated fashion. Though humans seem to understand natural language texts using these three kinds of analysis simultaneously, we have made the system's methodology essentially different from its human counterpart for more efficient computing. Our system uses a context-free grammar parser named Extended-Lingol as its syntactic analyzer to analyze Japanese sentences and produce parse trees. From an analysis of these, in turn, it obtains word-level semantic structures expressed in frame-like representations. Finally, using our representation, it extracts contextual information from the semantic structures. We remain far from certain at this stage whether this system represents the best realization of an engineering-based natural language understanding system.
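The serial arrangement of the three analyzers can be illustrated with a minimal Python sketch. This is not Extended-Lingol or the authors' implementation: the pre-tagged romanized tokens, the `Frame` class, the `PARTICLE_TO_CASE` table and all function names are hypothetical, and the particle-to-case mapping is a deliberate oversimplification of Japanese case marking.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical frame-like word-level semantic structure."""
    predicate: str
    cases: dict = field(default_factory=dict)


# Assumed, simplified mapping from case particles to case roles.
PARTICLE_TO_CASE = {"ga": "agent", "o": "object", "de": "location"}


def syntactic_analysis(tokens):
    """Stage 1 (toy): pair each noun with the particle that follows it
    and pick out the verb. tokens is a list of (surface, tag) pairs,
    with tag "N" (noun), "P" (case particle) or "V" (verb)."""
    phrases, verb, pending = [], None, None
    for surface, tag in tokens:
        if tag == "N":
            pending = surface
        elif tag == "P" and pending is not None:
            phrases.append((pending, surface))
            pending = None
        elif tag == "V":
            verb = surface
    return {"noun_phrases": phrases, "verb": verb}


def semantic_analysis(tree):
    """Stage 2 (toy): turn the rough structure into a case frame."""
    frame = Frame(predicate=tree["verb"])
    for noun, particle in tree["noun_phrases"]:
        case = PARTICLE_TO_CASE.get(particle)
        if case is not None:
            frame.cases[case] = noun
    return frame


def contextual_analysis(frame, context):
    """Stage 3 (toy): plug the event into the contextual structure."""
    event = {"type": frame.predicate, **frame.cases}
    context.setdefault("events", []).append(event)
    return context


def understand(tokens, context):
    """Run the three stages strictly one after another."""
    tree = syntactic_analysis(tokens)
    frame = semantic_analysis(tree)
    return contextual_analysis(frame, context)


# Roughly "guerrillas attacked the embassy", pre-segmented and romanized.
tokens = [("gerira", "N"), ("ga", "P"), ("taishikan", "N"),
          ("o", "P"), ("shuugeki-shita", "V")]
context = understand(tokens, {"topic": "terrorism"})
```

Each stage consumes only the output of the previous one, which is the sense in which the sketch is "serially integrated"; nothing from the contextual stage feeds back into parsing.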
Future plans include combining these three processes into one process and bringing the system closer to the human process. Because our system uses bottom-up analysis first (including syntactic analysis and word-level semantic analysis), it can obtain not only the outline of the input sentences but also their details, as necessary. This method is best suited to situations where the detailed information in texts is quite important, such as machine translation systems and precise question-answering systems. Of course, this approach requires us to build up a sizable dictionary of precise word definitions. In our system, predictive-style processing is not used in syntactic analysis or word-level semantic analysis. In the contextual analysis part, however, predictions from the tree structure of the contextual information are used for instantiation of the contextual structure. We are now developing a system which can understand newspaper articles through contextual structure (see Fig. 2a). After applying the procedures outlined above, the system obtains ...

[Figure: "Input sentences (newspaper articles in Japanese)"]
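The idea of using predictions from a tree of contextual information to instantiate the contextual structure might look roughly like the following Python sketch. The source does not describe the actual tree, its node types or its slots, so `ContextNode`, `instantiate`, and the "attack" and "demand" nodes with their expected slots are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextNode:
    """Hypothetical node in a tree of contextual knowledge."""
    name: str
    expects: tuple = ()                      # slots this node predicts
    filled: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def find(self, event_type: str) -> Optional["ContextNode"]:
        """Depth-first search for the node that predicts event_type."""
        if self.name == event_type:
            return self
        for child in self.children:
            hit = child.find(event_type)
            if hit is not None:
                return hit
        return None


def instantiate(root: ContextNode, event_type: str, cases: dict) -> Optional[list]:
    """Fill the predicted slots of the matching node from a case frame.

    Returns the predicted slots that are still unfilled, or None when
    the tree makes no prediction for this event type."""
    node = root.find(event_type)
    if node is None:
        return None
    for slot in node.expects:
        if slot in cases:
            node.filled[slot] = cases[slot]
    return [slot for slot in node.expects if slot not in node.filled]


# Assumed contextual knowledge about terrorism (illustrative only).
root = ContextNode("terrorism", children=[
    ContextNode("attack", expects=("agent", "target", "location")),
    ContextNode("demand", expects=("agent", "content")),
])

missing = instantiate(root, "attack",
                      {"agent": "guerrillas", "target": "embassy"})
```

In this sketch the list of unfilled slots stands in for the remaining predictions: later sentences could be checked against it to complete the same event, while the bottom-up stages themselves stay free of prediction.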

Similar resources

Turning Quantitative: An Analytic Scale to Do Critical Discourse Analysis

Critical Discourse Analysis (CDA) could be seen as a theory in qualitative more than in quantitative studies. This might have led to difficulty in doing CDA. Accordingly, this study attempted to develop a quantitative profile in the form of an analytic rubric. For this purpose, Fairclough’s model of CDA was selected as the research framework. The techniques used for structuring analy...

Cherry Blossom: A System for Japanese Character Recognition

A general purpose Japanese character recognition system, Cherry Blossom, has been developed at CEDAR in past years. It is designed to recognize Japanese document images in low resolution or with poor print quality. The system includes modules for page skew correction, document segmentation, text segmentation, character recognition and postprocessing. The API code for each module has been develo...

Development of a Robust and Compact On-Line Handwritten Japanese Text Recognizer for Hand-Held Devices

The paper describes how a robust and compact on-line handwritten Japanese text recognizer was developed by compressing each component of an integrated text recognition system, including an SVM classifier to evaluate segmentation points, an on-line and off-line combined character recognizer, a linguistic context processor, and a geometric context evaluation module, to deploy it on hand-held devices...

CRITAC - A Japanese Text Proofreading System

CRITAC (CRITiquing using ACcumulated knowledge) is an experimental expert system for proofreading Japanese text. It detects mistypes, Kana-to-Kanji misconversions, and stylistic errors. This system combines Prolog-coded heuristic knowledge with conventional Japanese text processing techniques which involve heavy computation and access to large language databases. 1. Introduction Curr...

Elements of Critical Context Studies

In this paper, it is argued that Critical Discourse Studies (CDS) should be extended to also include what may be called Critical Context Studies (CCS). After a summary of the new theory of context, defined in terms of special mental models in episodic memory, subjectively representing the ‘definition of the communicative situation’ by the participants, it is argued how a critical analysis of te...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Publication date: 1986